1. INTRODUCTION

Today's world is highly connected, and new technologies emerge every day. People increasingly
depend on machines, robots, and computers to reduce their workload and to make tasks quicker, easier, and
more effective. In fact, artificial intelligence and machine learning have revolutionized the world because
they provide users with intelligent systems and interfaces in many domains, including civil engineering
[1], [2], education [3]-[7], tourism [8]-[10], medicine [11]-[15], and justice.

Morocco, like many developing countries, is currently moving from paper-based processes to digital
ones. This digital transformation is under way, but not at full speed. The country's courts of justice are
among the public sector institutions that are aware of this transformation. Almost 80,000 road accidents
occur in Morocco each year, and the victims of those accidents expect some form of compensation. The courts
in Morocco still work in a traditional way, which puts considerable psychological pressure on those victims.
Implementing artificial intelligence and machine learning in the field of justice should create a more
positive environment and help legal cases be resolved quickly and effectively.

The primary motivation of this work is to take advantage of artificial intelligence to develop
decision-making systems and to facilitate laborious tasks for the different parties in the justice system. To
this end, we compiled and processed data from the Errachidia court to build a model that predicts the outcome
of accident cases. We trained and evaluated three machine learning algorithms, namely linear regression,
decision tree, and random forests. The remainder of this paper is organized into four main sections. Section 2
provides an overview of works related to the topic of the paper. Section 3 describes the methodology
followed to process the collected dataset and introduces the three machine learning algorithms implemented
in this study. Section 4 presents the results obtained. Finally, Section 5 concludes the paper.


2. RELATED WORK

Processing legal data to automatically retrieve valuable information plays a vital role in the
legal field [16]. Numerous works have dealt with legal data and the use of information technology in the field
of justice. However, work on Arabic legal text is scarce compared to other languages, especially when
the data come from Moroccan justice institutions. To the best of our knowledge, very few studies have
worked on similar data in Morocco (e.g., [17] and [18]), and ours is the first predictive model
that predicts the outcomes of legal cases in Moroccan courts. In this section, we shed light on
relevant studies related to the topic of the paper.

Medvedeva et al. [19] aimed to automatically process court proceedings
to predict the likely verdict of new cases. The data was compiled from the European Court of
Human Rights (ECtHR) and is publicly available. A linear support vector machine (SVM) classifier was
trained, in a supervised manner, on many court cases paired with their
judgments. The model was trained to predict only two verdicts, violation vs. no violation. The
prediction model was evaluated on 9 articles of the ECtHR and achieved an accuracy of 75%.

Wu et al. [20] attempted to identify predictors of drug court graduation among amphetamine-using
participants. The database used includes data on 540 participants, of whom 341 are
amphetamine-using. The study used multivariate binary regression as the predictive model, and chi-square
and t-tests were performed to compare the outcome and amphetamine-use groups. The results showed
that having children and the interaction of using amphetamine and being employed were predictive of graduation.

The use of artificial intelligence and machine learning techniques is increasing in the public sector. 
For instance, Lima and Delen [21] conducted a study to predict corruption-related issues inside government 
institutions. The study worked on massive datasets collected from several sources, such as the human
development reports of the United Nations Development Programme. The final database comprised
information on 117 variables across 132 nations from different world regions (Americas, Asia Pacific,
Europe, Central Asia, Middle East, North Africa, and Sub-Saharan Africa). Predictive models were then
built using popular machine learning algorithms, namely SVM, artificial neural networks (ANN), and random
forests. The results revealed that random forests achieved the highest accuracy (85.77%), followed by SVM
with 76.15% and ANN with 73.84%.

Metsker et al. [22] dealt with Russian court decisions using machine learning. They used Spark for 
data processing and decision trees for analysis. They developed methods of extracting and structuring 
knowledge taking into account the specificities of the legislation of the Russian Federation. 

Bozkir and Sezer [23] predicted actual consumption in food demand using
three decision tree methods (CART, CHAID, and Microsoft Decision Trees). The study was carried out in the food
courts of Hacettepe University, and the prediction accuracy reached an R² of up to 0.83.

The study in [24] sought to introduce technology into legal practice. Because traditional
legal procedures are time-consuming and involve many steps, the study
developed what its authors call a “virtual courtroom” to resolve cases and problems that might occur.

Gomes et al. [25] investigated the effects of investments in information and communication 
technologies on the productivity of courts in Brazil. Organizations in the Brazilian justice system are seeking 
solutions to many of the challenges, such as limited access to justice and delays in resolving cases. The 
results confirm four of five hypotheses, showing that investment in information and communication 
technologies has a direct positive effect on the productivity of courts in Brazil. 

Public data, once processed, can be used to improve society and policy making, but personal 
information must be removed from the data. Sharafat et al. [26] made an intelligent system to extract 
personal information from legal documents before mining them in searches. To automatically extract these 
entities, the first requirement is to construct a dataset using legal judgments. Thus, annotation guidelines are 
first prepared, followed by the preparation of an annotated dataset for extraction of various legal entities. 
Experiments with various datasets, several algorithms, and annotation schemes resulted in a maximum F1 
score of 91.51% using conditional random fields. 


3. METHOD 

In this section, we describe our methodology, which consists of the following steps. First, we
extract the most representative characteristics. Second, we divide the collected data into two parts (training
and test). Third, we run the learning process for each algorithm to build a model. Finally, we test and
evaluate each model built. Figure 1 describes the workflow of our research methodology.
Figure 1. Research method
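
The second step of the workflow (dividing the data into a training part and a test part) can be sketched as follows. This is only an illustration: the paper does not state the split proportion, so the 80/20 ratio, the fixed seed, and the helper name `train_test_split` are assumptions, and the integer "cases" are placeholders for processed judgments.

```python
import random


def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists.

    test_ratio and seed are illustrative defaults, not values from the paper.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    # The first n_test shuffled rows form the test part, the rest the training part.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical placeholder records standing in for processed judgments.
cases = list(range(10))
train, test = train_test_split(cases)
```

Each model is then fitted on `train` only and scored on `test`, so that the reported score reflects cases the model has not seen.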


3.1. Description of the dataset 

The dataset consists of judgments of the Errachidia court in Morocco. The cases treated are accident
cases, and the selected judgments date from 2017 to 2019. These judgments are written in Arabic
because it is the official language of the courts in Morocco. Although a number of Arabic corpora are available
[6], none of them is similar to this dataset. Therefore, we believe that the current version and all updates to
this dataset will be a valuable resource for computational linguistics and natural language processing,
as well as for machine learning applications such as decision making.


3.2. Identification of characteristics 

After analyzing numerous judgments, discussing the subject with different parties in the field,
and reviewing the codes and jurisprudence that govern this type of case, we identified
the most important characteristics that determine the results of these judgments. These characteristics are
presented in Table 1.


Table 1. Description of characteristics 


No.  Characteristic             Description
1    Age                        Age of the victim
2    Job                        Employed, Student, Not employed
3    Salary                     Salary if the victim has a job; otherwise the minimum salary is used
4    Partial disability ratio   Ratio defined by expert
5    Total deficit ratio        Ratio defined by expert
6    Physical pain              Not important, Important...
7    Distortion of congenital   Not important, Important...


Several criteria determine the amount of compensation awarded to
accident victims, such as age, salary, the disability rates defined by expert doctors, and the share of
responsibility of each party to the accident. The outcome variable to be predicted is the amount of
compensation for the victim of an accident. The seven characteristic variables are appropriate for predicting
the final amount of compensation because each of them can modify this amount.

- Example 1: Two people have the same salary and disability rates, but one of them is much
older; the amount of compensation for the older person is lower than for the younger one.

- Example 2: Two people have the same age and salary, but one of their disability rates differs; the amount
is greater for the person with the higher disability rate.

Thus, each of these seven variables can modify the final amount of compensation.
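
Because the seven characteristics mix numeric and categorical values, they have to be turned into numbers before a regression algorithm can use them. The sketch below shows one possible encoding; it is an assumption, not the preprocessing actually applied to the court data. The field names, the one-hot coding of the job, the ordinal coding of the severity levels, and the example values are all hypothetical.

```python
from dataclasses import dataclass

JOB_CATEGORIES = ["Employed", "Student", "Not employed"]
# Only the two levels named in Table 1; the full scale is not given there.
SEVERITY_LEVELS = ["Not important", "Important"]


@dataclass
class Case:
    """One accident case described by the seven characteristics (hypothetical field names)."""
    age: float
    job: str
    salary: float
    partial_disability_ratio: float
    total_deficit_ratio: float
    physical_pain: str
    distortion: str


def encode(case):
    """Turn one case into a flat list of numbers usable as model input."""
    # One-hot encode the job so that no artificial order is imposed on it.
    job_one_hot = [1.0 if case.job == j else 0.0 for j in JOB_CATEGORIES]
    return (
        [float(case.age)]
        + job_one_hot
        + [float(case.salary),
           float(case.partial_disability_ratio),
           float(case.total_deficit_ratio)]
        # Severity levels are ordered, so their list index is used as the value.
        + [float(SEVERITY_LEVELS.index(case.physical_pain)),
           float(SEVERITY_LEVELS.index(case.distortion))]
    )


# Made-up example values, for illustration only.
example = Case(35, "Employed", 3000.0, 0.10, 0.05, "Important", "Not important")
vector = encode(example)
```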


3.3. Algorithms used

Since we have inputs and a known result to predict, this is a supervised learning problem. And since the
compensation to predict is a continuous number, it is a regression problem. We therefore worked
with three regression algorithms: “linear regression”, “decision tree”, and “random forest”.


3.3.1. Linear regression 

Linear regression attempts to model the relationship between two variables by fitting a linear 
equation to the observed data. One variable is considered an explanatory variable and the other is considered 
a dependent variable. For example, a modeler may want to relate apartment prices to their area using a linear 
regression model. Before attempting to fit a linear model to the observed data, a modeler must first determine 
whether or not there is a relationship between the variables of interest. This does not necessarily imply that 
one variable causes the other, but that there is a significant association between the two variables. A linear 
regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the 
dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0).
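
As an illustration of fitting Y = a + bX, the following minimal sketch computes the least-squares slope and intercept for the one-variable case, using the apartment example mentioned above. The data values and the function name `fit_line` are made up for the illustration; a model with several explanatory variables, as needed for the seven characteristics, generalizes the same idea.

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = a + bX; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Made-up data: each price is exactly 3 times the area.
areas = [50.0, 60.0, 80.0, 100.0]
prices = [150.0, 180.0, 240.0, 300.0]
a, b = fit_line(areas, prices)
```

On this toy data the fitted slope b is 3 and the intercept a is 0, since the prices were constructed to lie exactly on that line.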


3.3.2. Decision tree 

Decision tree learning is one of the predictive modeling approaches used in statistics, data mining, and 
machine learning. It uses a decision tree (as a predictive model) to go from observations on an item (represented 
in branches) to conclusions about the item’s target value (represented in leaves). Tree models in which the target 
variable can take a discrete set of values are called classification trees; in these trees, leaves represent class 
labels and branches represent conjunctions of features that lead to those class labels. Decision trees in which the
target variable can take continuous values (usually real numbers) are called regression trees. Decision trees are 
among the most popular machine learning algorithms due to their intelligibility and simplicity. 
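
To make the idea of a regression tree concrete, the sketch below builds the smallest possible one: a single split on one feature, chosen to minimize the sum of squared errors, with each leaf predicting the mean of its targets. It is a simplified illustration, not the implementation used in this study; the function names and the data values are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimizing the total squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def predict(x, split):
    """Follow the branch for x and return the mean stored in that leaf."""
    threshold, left_mean, right_mean = split
    return left_mean if x <= threshold else right_mean


# Made-up data: disability ratio versus compensation amount.
ratios = [0.05, 0.10, 0.40, 0.50]
amounts = [10.0, 12.0, 50.0, 54.0]
split = best_split(ratios, amounts)
```

A full regression tree applies the same search recursively inside each branch and over all features until a stopping rule is met.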


3.3.3. Random forests 

Random forests, or random decision forests, are an ensemble learning method for regression, classification,
and other tasks that works by building a multitude of decision trees at training time and outputting the
class that is the mode of the classes (classification) or the mean prediction (regression) of the
individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set.
Random forests typically outperform single decision trees, but their accuracy is often lower than that of
gradient-boosted trees. However, the characteristics of the data can affect their performance.
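
The averaging principle can be sketched as follows. Note that this is a simplified stand-in: it bags one-split trees on a single feature, whereas a full random forest grows deeper trees and also samples a random subset of features at each split. The number of trees, the seed, the function names, and the data values are assumptions made for the illustration.

```python
import random


def fit_stump(xs, ys):
    """One-split regression tree on one feature: (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(xs, ys))
    overall = sum(ys) / len(ys)
    # Fallback when no split is possible: predict the overall mean on both sides.
    best = (float("inf"), pairs[0][0], overall, overall)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        cost = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Fit each tree on a bootstrap sample (drawn with replacement) of the data."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Regression output of the ensemble: the mean of the individual tree predictions."""
    preds = [predict_stump(stump, x) for stump in forest]
    return sum(preds) / len(preds)


# Made-up data: disability ratio versus compensation amount.
xs = [0.05, 0.10, 0.40, 0.50]
ys = [10.0, 12.0, 50.0, 54.0]
forest = fit_forest(xs, ys)
low = predict_forest(forest, 0.05)
high = predict_forest(forest, 0.50)
```

Because each tree sees a slightly different sample, their individual errors partly cancel when averaged, which is what reduces overfitting relative to a single tree.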


4. RESULTS AND DISCUSSION 
It is important to evaluate models in order to know which methods perform best. Several
metrics can be used to measure the performance of a regression model, among them
R-squared, root mean squared error, residual standard error, and mean absolute error. In our work, we
use the first of these to compare the selected algorithms. Below is a general definition of this metric.
R-squared (R²) is defined as (1 - U/V), where U is the residual sum of squares and V is the total sum
of squares. The best possible score is 1.0, and the score can be negative (because the model can be arbitrarily worse).


$$U = \sum_{i} (\mathrm{true}_i - \mathrm{pred}_i)^2 \tag{1}$$
$$V = \sum_{i} (\mathrm{true}_i - \mathrm{mean}(\mathrm{true}))^2 \tag{2}$$
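
The definition of R-squared in terms of U and V can be computed directly, as in the following minimal sketch. The function name `r_squared` and the toy values are illustrative; the three calls show a perfect model, a model that always predicts the mean, and a model worse than the mean.

```python
def r_squared(true, pred):
    """R-squared = 1 - U/V, with U the residual and V the total sum of squares."""
    mean_true = sum(true) / len(true)
    u = sum((t - p) ** 2 for t, p in zip(true, pred))
    v = sum((t - mean_true) ** 2 for t in true)
    # v is zero when all true values are equal; the score is undefined in that case.
    return 1 - u / v


true = [10.0, 20.0, 30.0]
perfect = r_squared(true, [10.0, 20.0, 30.0])   # predictions equal the true values
baseline = r_squared(true, [20.0, 20.0, 20.0])  # always predicts the mean
worse = r_squared(true, [30.0, 20.0, 10.0])     # worse than predicting the mean
```

Here `perfect` is 1.0, `baseline` is 0.0, and `worse` is -3.0, which illustrates why the score has an upper bound of 1.0 but no lower bound.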


Each model was tested on the same collected data and under the same conditions. Table 2
and Figure 2 show the accuracy value for each algorithm. The results show that machine learning,
and specifically the random forest algorithm, is very effective in the field of justice in Morocco, as
shown by its high score (91.05%). Compared with other works and experiments, this
figure can be higher or lower depending on the context and the circumstances. As the
first work of its kind in Morocco, this study can help judges predict, in minimal time, the
compensation for victims of road accidents.


Table 2. Accuracy of algorithms 


Algorithm            Accuracy
Linear regression    63.56%
Decision tree        82.23%
Random forests       91.05%
Figure 2. Research method


5. CONCLUSION AND PERSPECTIVES
In this paper, we presented what is, to our knowledge, the first study of its kind in Morocco, working on
data from accident cases of the Errachidia court to build a predictive model that can help resolve many court
cases. The results revealed that the random forests model obtained the highest accuracy (91.05%), followed by
decision tree with 82.23% and linear regression with 63.56%. As discussed earlier, other studies
have also indicated the efficacy of machine learning in the field of justice. In future work, we plan
further projects in the field of justice in Morocco, focusing on other types of
legal cases such as crimes.